Hybrid Text Chunking

Authors

  • Guodong Zhou
  • Jian Su
  • TongGuan Tey
Abstract

This paper describes an HMM-based chunk tagger and its extensions used at KRDL for the shared task of CoNLL'2000. Compared with a standard HMM-based tagger, this tagger incorporates more contextual information into a lexical entry. Moreover, an error-driven learning approach is adopted to decrease the memory requirement: it keeps only positive lexical entries, that is, those which contribute to error reduction. This makes it possible to incorporate more context-dependent lexical entries and improve performance. Finally, memory-based learning is integrated to further improve the performance of the chunk tagger.
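
The error-driven filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a simple lookup tagger stands in for the HMM, the toy data is invented, and the names `TRAIN`, `most_frequent`, `select_positive_entries`, and `tag` are hypothetical. Gains are also measured on the training data itself, which is a simplifying assumption. The sketch shows only the selection principle: a lexical entry is kept when it corrects more baseline errors than it introduces.

```python
from collections import Counter, defaultdict

# Toy (word, POS tag, chunk tag) triples. Invented for illustration only.
TRAIN = [
    ("the", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "B-VP"),
    ("on", "IN", "B-PP"), ("the", "DT", "B-NP"), ("mat", "NN", "I-NP"),
    ("time", "NN", "B-NP"), ("flies", "VBZ", "B-VP"),
    ("in", "IN", "B-PP"), ("of", "IN", "B-PP"),
    ("that", "IN", "B-SBAR"), ("that", "IN", "B-SBAR"),
]


def most_frequent(pairs):
    """Map each key to the chunk tag it co-occurs with most often."""
    counts = defaultdict(Counter)
    for key, chunk_tag in pairs:
        counts[key][chunk_tag] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def select_positive_entries(data):
    """Return (kept lexical entries, POS-only baseline model).

    A lexical entry is a (word, POS) -> chunk tag override. It is kept
    only if it fixes more baseline errors than it introduces.
    """
    pos_model = most_frequent((pos, t) for _, pos, t in data)
    lexical = most_frequent(((w, pos), t) for w, pos, t in data)
    gain = Counter()
    for word, pos, gold in data:
        baseline_ok = pos_model[pos] == gold
        lexical_ok = lexical[(word, pos)] == gold
        gain[(word, pos)] += int(lexical_ok) - int(baseline_ok)
    kept = {key: lexical[key] for key, g in gain.items() if g > 0}
    return kept, pos_model


def tag(tokens, entries, pos_model):
    """Tag (word, POS) pairs: use a kept lexical entry, else the POS baseline."""
    return [entries.get((w, pos), pos_model[pos]) for w, pos in tokens]


entries, pos_model = select_positive_entries(TRAIN)
print(entries)
print(tag([("that", "IN"), ("cat", "NN")], entries, pos_model))
```

In this toy setting only two of the ten candidate lexical entries survive, which mirrors the memory saving the abstract describes; the freed space is what would allow richer, more context-dependent entries to be added.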

Similar resources

A Text Chunker and Hybrid POS Tagger for Indian Languages

Part-of-Speech (POS) tagging can be described as a task of doing automatic annotation of syntactic categories for each word in a text document. This paper presents a generic hybrid POS tagger for Indian languages. Indian languages are relatively free word order, morphologically productive and agglutinative languages. In this hybrid implementation we have used combination of statistical approach...

تعیین مرز و نوع عبارات نحوی در متون فارسی (Determining the boundaries and types of syntactic phrases in Persian texts) [in Persian]

Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, and sentences. Tokenizing text into syntactic phrases, known as chunking, is an important preprocessing step in many applications, such as machine translation, information retrieval, and text-to-speech. In this paper, chunking of Farsi texts is done using statistical and learning methods and the grammat...

Text Chunking by Combining Hand-Crafted Rules and Memory-Based Learning

This paper proposes a hybrid of handcrafted rules and a machine learning method for chunking Korean. In the partially free word-order languages such as Korean and Japanese, a small number of rules dominate the performance due to their well-developed postpositions and endings. Thus, the proposed method is primarily based on the rules, and then the residual errors are corrected by adopting a memo...

Enrichir et raisonner sur des espaces sémantiques pour l'attribution de mots-clés (Enriching and reasoning on semantic spaces for keyword extraction) [in French]

This article presents a multi-modular hybrid system for extracting keywords from a corpus of scientific articles. The system is multi-modular because it integrates components executing transformations on 1) the morphosyntactic level (lemmatization and chunking), 2) the semantic level (Reflected Random Indexing), as well as upon a more 3) « pra...

Chunking Clinical Text Containing Non-Canonical Language

Free text notes typed by primary care physicians during patient consultations typically contain highly non-canonical language. Shallow syntactic analysis of free text notes can help to reveal valuable information for the study of disease and treatment. We present an exploratory study into chunking such text using off-the-shelf language processing tools and pre-trained statistical models. We eval...

Publication date: 2000